# Multimodal feature extraction

CLIP ViT L Rho50 K1 Constrained FARE2

A feature extraction model fine-tuned from openai/clip-vit-large-patch14, with both the image and text encoders optimized.

Multimodal Fusion

MoonViT SO 400M

MoonViT is a native-resolution vision encoder, initialized from SigLIP-SO-400M and then continually pre-trained. It is suited to image feature extraction tasks.

Image Enhancement

Vit Large Patch14 Clip 224.dfn2b

A Vision Transformer (ViT) model based on the CLIP architecture, released by Apple and focused on image feature extraction.

Image Classification
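
The model names above encode the patch size and input resolution ("Patch14", "224"). As a rough illustration, and assuming the usual ViT convention of splitting a square image into non-overlapping square patches, the sketch below computes how many patches such a model would see. The helper name `patch_grid` is hypothetical and not part of any library; class tokens and other special tokens are not counted.

```python
def patch_grid(image_size: int, patch_size: int) -> tuple[int, int]:
    """Return (patches per side, total patches) for a square image.

    Assumes non-overlapping square patches, as in a standard ViT.
    """
    if image_size <= 0 or patch_size <= 0:
        raise ValueError("image_size and patch_size must be positive")
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side


# Values read off the model name "Vit Large Patch14 Clip 224".
per_side, total = patch_grid(image_size=224, patch_size=14)
print(per_side, total)  # 16 256
```

A native-resolution encoder such as MoonViT does not fix the input to a single size, so the patch count there would vary with each image.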

© 2025 AIbase